In-Place Longest Common Extensions

Author

Nicola Prezza

Abstract

Longest Common Extension (LCE) queries are a fundamental subroutine in many string-processing algorithms, including (but not limited to) suffix sorting, string matching, and identification of palindrome factors and repeats. An LCE query takes as input two positions i, j in a text T ∈ Σ^n and returns the length l of the longest common prefix between T's i-th and j-th suffixes. It is clear that we can store T in n⌈log₂ |Σ|⌉ bits and answer LCE queries in O(l) time by direct comparison of the two suffixes. This solution also has the advantage of supporting optimal-time text extraction. In this paper, we prove the following (somewhat surprising) result: in the RAM model, n⌈log₂ |Σ|⌉ bits of space are sufficient to support deterministic O(log² l)-time LCE queries and optimal-time text extraction. LCE query times can be improved to O(log l) by adding only O(log n) words to the space usage. In other words, we can replace the (plain) text with a data structure of the same size supporting exponentially faster LCE queries without penalizing text extraction times. Importantly, our structure can be built in O(n log n) expected time and linear space, and is therefore also practical. By applying our techniques to the suffix sorting problem, we obtain (i) a novel in-place suffix array construction algorithm and (ii) the first efficient in-place solution for the sparse suffix sorting problem.
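
To make the query concrete, the sketch below shows the O(l) direct comparison described in the abstract, followed by one standard way to reach O(log l) comparisons per query: exponential search plus binary search over Karp-Rabin prefix fingerprints. This is an illustration only, not the paper's construction. It is an assumption of this sketch that fingerprints are kept in O(n) extra words and that answers are correct with high probability rather than deterministically, whereas the paper's structure is in-place and deterministic. The names lce_naive and FingerprintLCE are hypothetical.

```python
import random


def lce_naive(text, i, j):
    """LCE by direct comparison of the two suffixes: O(l) time."""
    n = len(text)
    l = 0
    while i + l < n and j + l < n and text[i + l] == text[j + l]:
        l += 1
    return l


class FingerprintLCE:
    """Illustrative LCE index using Karp-Rabin prefix fingerprints.

    Assumption of this sketch: O(n) extra words and Monte Carlo
    correctness (hash collisions are possible but very unlikely).
    """

    MOD = (1 << 61) - 1  # a Mersenne prime

    def __init__(self, text, seed=0):
        self.n = len(text)
        rng = random.Random(seed)
        self.base = rng.randrange(256, self.MOD - 1)
        self.pref = [0] * (self.n + 1)  # pref[k] = fingerprint of text[:k]
        self.pw = [1] * (self.n + 1)    # pw[k] = base**k mod MOD
        for k, ch in enumerate(text):
            self.pref[k + 1] = (self.pref[k] * self.base + ord(ch) + 1) % self.MOD
            self.pw[k + 1] = (self.pw[k] * self.base) % self.MOD

    def _fp(self, start, length):
        """Fingerprint of text[start:start + length] in O(1) time."""
        end = start + length
        return (self.pref[end] - self.pref[start] * self.pw[length]) % self.MOD

    def _equal(self, i, j, length):
        return self._fp(i, length) == self._fp(j, length)

    def lce(self, i, j):
        """LCE using O(log l) fingerprint comparisons."""
        limit = self.n - max(i, j)  # longest length that fits in both suffixes
        lo, hi = 0, 1               # invariant: the first lo characters match
        # Exponential search: double hi until a mismatch or the end of the text.
        while hi <= limit and self._equal(i, j, hi):
            lo, hi = hi, hi * 2
        hi = min(hi, limit + 1)     # length limit + 1 can never match
        # Binary search between the last match (lo) and the first mismatch (hi).
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self._equal(i, j, mid):
                lo = mid
            else:
                hi = mid
        return lo
```

For example, with text = "abracadabra", both lce_naive(text, 0, 7) and FingerprintLCE(text).lce(0, 7) compare the suffixes "abracadabra" and "abra" and return 4.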

Similar resources

Modifications of the Landau-Vishkin Algorithm Computing Longest Common Extensions via Suffix Arrays and Efficient RMQ computations

Approximate string matching is an important problem in Computer Science. The standard solution for this problem is an O(mn) running time and space dynamic programming algorithm for two strings of length m and n. Landau and Vishkin developed an algorithm which uses suffix trees for accelerating the computation along the dynamic programming table and reaching space and running time in O(nk), wher...

A Modification of the Landau-Vishkin Algorithm Computing Longest Common Extensions via Suffix Arrays

Approximate string matching is an essential problem in many areas related to Computer Science including biological sequence processing. The standard solution of this problem is an O(mn) running time and space dynamic programming algorithm for two strings of length m and n. Landau and Vishkin developed an algorithm which uses suffix trees for accelerating the computation along the dynamic progra...

Longest Common Extensions in Sublinear Space

The longest common extension problem (LCE problem) is to construct a data structure for an input string T of length n that supports LCE(i, j) queries. Such a query returns the length of the longest common prefix of the suffixes starting at positions i and j in T. This classic problem has a well-known solution that uses O(n) space and O(1) query time. In this paper we show that for any trade-of...

Using longest common subsequence and character models to predict word forms

This paper presents an algorithm for automatic word forms inflection. We use the method of longest common subsequence to extract abstract paradigms from given pairs of basic and inflected word forms, as well as suffix and prefix features to predict this paradigm automatically. We elaborate this algorithm using combination of affix feature-based and character ngram models, which substantially en...

Extensions of Some Fixed Point Theorems for Weak-Contraction Mappings in Partially Ordered Modular Metric Spaces

The purpose of this paper is to establish fixed point results for a single mapping in a partially ordered modular metric space, and to prove a common fixed point theorem for two self-maps satisfying some weak contractive inequalities.

Journal:

CoRR

Volume abs/1608.05100

Publication date: 2016
